multi-task learning problem
Single-Stage Visual Relationship Learning using Conditional Queries
Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead, and typically hinders end-to-end optimizations. To address this, recent research attempts to train single-stage models that are more computationally efficient. With the advent of DETR, a set-based detection model, one-stage models attempt to predict a set of subject-predicate-object triplets directly in a single shot. However, SGG is inherently a multi-task learning problem that requires modeling entity and predicate distributions simultaneously. In this paper, we propose Transformers with conditional queries for SGG, namely, TraCQ with a new formulation for SGG that avoids the multi-task learning problem and the combinatorial entity pair distribution. We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space as well, which leads to 20% fewer parameters compared to state-of-the-art one-stage models. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats state-of-the-art two-stage methods on the Visual Genome dataset, yet is capable of end-to-end training and faster inference.
Cross-Learning from Scarce Data via Multi-Task Constrained Optimization
Agorio, Leopoldo, Cerviño, Juan, Calvo-Fullana, Miguel, Ribeiro, Alejandro, Bazerque, Juan Andrés
A learning task, understood as the problem of fitting a parametric model from supervised data, fundamentally requires the dataset to be large enough to be representative of the underlying distribution of the source. When data is limited, the learned models fail to generalize to cases not seen during training. This paper introduces a multi-task cross-learning framework to overcome data scarcity by jointly estimating deterministic parameters across multiple, related tasks. We formulate this joint estimation as a constrained optimization problem, where the constraints dictate the resulting similarity between the parameters of the different models, allowing the estimated parameters to differ across tasks while still combining information from multiple data sources. This framework enables knowledge transfer from tasks with abundant data to those with scarce data, leading to more accurate and reliable parameter estimates and providing a solution for scenarios where parameter inference from limited data is critical. We provide theoretical guarantees in a controlled framework with Gaussian data, and show the efficiency of our cross-learning method in applications with real data, including image classification and propagation of infectious diseases. The machine learning problem, in general, involves extracting information from a dataset, which is typically achieved by fitting the parameters of a model [1], whether it be a neural network or a more specific parametric function that incorporates additional knowledge about the data source. Once fitted, this parametric model can be used for classification, prediction, or estimation.
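As a reading aid, the constrained joint estimation described in the abstract can be sketched as below. The notation is an assumption made for illustration and does not come from the source: \theta_t are the task parameters, \theta_g is a shared central parameter, \ell_t is the empirical loss of task t, and \epsilon is the closeness budget. The paper's exact formulation may differ.

```latex
\begin{aligned}
\min_{\theta_1,\dots,\theta_T,\;\theta_g} \quad & \sum_{t=1}^{T} \ell_t(\theta_t) \\
\text{subject to} \quad & \lVert \theta_t - \theta_g \rVert \le \epsilon, \qquad t = 1,\dots,T .
\end{aligned}
```

Under this sketch, \epsilon = 0 forces a single shared model for all tasks, while a very large \epsilon decouples the tasks entirely; intermediate values let data-rich tasks inform data-scarce ones.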
Review for NeurIPS paper: Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
Weaknesses: The paper is not very novel or significant in its contribution. It combines two regularization methods to mitigate two long-standing problems in differentiable NAS, but neither proposed method is very novel. NAS-Bench is not a well-established benchmark, and not many people are familiar with it. It is not fair to compare with existing work on NAS-Bench, as most of those methods were not optimized for NAS-Bench. For instance, DARTS may work equally well with proper hyperparameter tuning and regularization. With the existing DARTS hyperparameters, search on NAS-Bench converges to networks with only identity/skip operations.
Multi-task Bias-Variance Trade-off Through Functional Constraints
Cervino, Juan, Bazerque, Juan Andres, Calvo-Fullana, Miguel, Ribeiro, Alejandro
Multi-task learning aims to acquire a set of functions, either regressors or classifiers, that perform well for diverse tasks. At its core, the idea behind multi-task learning is to exploit the intrinsic similarity across data sources to aid in the learning process for each individual domain. In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores dependencies on the other tasks -- to propose a bias-variance trade-off. To control the relationship between the variance (given by the number of i.i.d. samples) and the bias (coming from data from other tasks), we introduce a constrained learning formulation that enforces domain-specific solutions to be close to a central function. This problem is solved in the dual domain, for which we propose a stochastic primal-dual algorithm. Experimental results for a multi-domain classification problem with real data show that the proposed procedure outperforms both the task-specific and the single classifiers.
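The idea of keeping task-specific solutions close to a central one and solving via a stochastic primal-dual method can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes linear regressors with squared loss, a closeness constraint of the form 0.5 * ||w_t - w_c||^2 <= eps, and hypothetical names (`primal_dual_multitask`, `eps`, `lr`, `dual_lr`) chosen only for this example.

```python
import numpy as np

def primal_dual_multitask(X, y, eps=0.05, lr=0.05, dual_lr=0.05,
                          steps=500, batch=8, seed=0):
    """Illustrative stochastic primal-dual loop (assumed setup, not the paper's).

    X, y : lists with one (n_t, d) feature matrix and one (n_t,) target per task.
    Each task t keeps its own weights W[t], constrained to stay within
    0.5 * ||W[t] - w_c||^2 <= eps of a shared central vector w_c.
    """
    rng = np.random.default_rng(seed)
    T, d = len(X), X[0].shape[1]
    W = np.zeros((T, d))      # task-specific primal variables
    w_c = np.zeros(d)         # central primal variable
    lam = np.zeros(T)         # one dual variable per closeness constraint

    for _ in range(steps):
        # Primal step: stochastic gradient descent on the Lagrangian.
        for t in range(T):
            idx = rng.integers(0, X[t].shape[0], size=batch)
            Xb, yb = X[t][idx], y[t][idx]
            grad_loss = Xb.T @ (Xb @ W[t] - yb) / batch
            W[t] -= lr * (grad_loss + lam[t] * (W[t] - w_c))
        # The central vector only feels the constraint terms.
        w_c -= lr * (lam[:, None] * (w_c - W)).sum(axis=0)

        # Dual step: gradient ascent on the constraint slack, projected to >= 0.
        slack = 0.5 * np.sum((W - w_c) ** 2, axis=1) - eps
        lam = np.maximum(0.0, lam + dual_lr * slack)
    return W, w_c, lam

# Usage on two synthetic, related regression tasks (made-up data).
rng = np.random.default_rng(1)
X = [rng.normal(size=(40, 2)) for _ in range(2)]
y = [X[0] @ np.array([1.0, 0.0]), X[1] @ np.array([1.2, 0.1])]
W, w_c, lam = primal_dual_multitask(X, y)
```

In this sketch a very large `eps` leaves every dual variable at zero, so the tasks are learned independently, whereas a small `eps` pulls all task weights toward the central vector; this mirrors the two extremes the abstract describes.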
On Low-rank Trace Regression under General Sampling Distribution
A growing number of modern statistical learning problems involve estimating a large number of parameters from a (smaller) number of observations. In a subset of these problems (matrix completion, matrix compressed sensing, and multi-task learning) the unknown parameters form a high-dimensional matrix, and two popular approaches for the estimation are trace-norm regularized linear regression and alternating minimization. It is also known that these estimators satisfy certain optimal tail bounds under assumptions on rank, coherence, or spikiness of the unknown matrix. We study a general family of estimators and sampling distributions that includes the above two estimators, and introduce a general notion of spikiness and rank for the unknown matrix. Next, we extend the existing literature on the analysis of these estimators and provide a unifying technique to prove tail bounds for the estimation error. We demonstrate the benefit of this generalization by studying its application to four problems: (1) matrix completion, (2) multi-task learning, (3) compressed sensing with Gaussian ensembles, and (4) compressed sensing with factored measurements. For (1) and (3), we recover tail bounds matching those found in the literature, and for (2) and (4) we obtain (to the best of our knowledge) the first tail bounds. Our approach relies on a generic recipe to prove restricted strong convexity for the sampling operator of the trace regression, which only requires finding upper bounds on certain norms of the parameter matrix.
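For orientation, the trace-norm regularized estimator mentioned above is commonly written in the following standard form. The symbols are assumptions for illustration (observations y_i, measurement matrices X_i, regularization weight \lambda); the general family studied in the paper may be parameterized differently.

```latex
\hat{B} \;\in\; \arg\min_{B \in \mathbb{R}^{d_1 \times d_2}}
\; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \langle X_i, B \rangle \bigr)^2
\;+\; \lambda \, \lVert B \rVert_{*},
\qquad
\langle X_i, B \rangle = \operatorname{tr}\!\bigl( X_i^{\top} B \bigr),
```

where \lVert B \rVert_{*} denotes the trace (nuclear) norm, the sum of the singular values of B.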
Low-Rank Deep Convolutional Neural Network for Multi-Task Learning
Su, Fang, Shang, Hai-Yang, Wang, Jing-Yan
In this paper, we propose a novel multi-task learning method based on the deep convolutional network. The proposed deep network has four convolutional layers, three max-pooling layers, and two parallel fully connected layers. To adapt the deep network to the multi-task learning problem, we propose to learn a low-rank deep network so that the relations among different tasks can be explored. We propose to minimize the number of independent parameter rows of one fully connected layer, measured by the nuclear norm of that layer's parameter matrix, and thus seek a low-rank parameter matrix. Meanwhile, we also propose to regularize the other fully connected layer by a sparsity penalty, so that the useful features learned by the lower layers can be selected. The learning problem is solved by an iterative algorithm based on gradient descent and back-propagation. The proposed algorithm is evaluated on benchmark data sets for multiple face attribute prediction, multi-task natural language processing, and joint economics index prediction. The evaluation results show the advantage of the low-rank deep CNN model on multi-task problems.
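The two regularizers named in the abstract, a nuclear norm on one fully connected layer and a sparsity penalty on the other, can be sketched with NumPy as follows. The abstract states that training uses gradient descent and back-propagation; the proximal operators shown here (singular value thresholding and soft thresholding) are a swapped-in, standard way to handle such penalties and are an assumption of this example, as are all function names.

```python
import numpy as np

def nuclear_norm(W):
    """Sum of singular values of W, a convex surrogate for rank."""
    return np.linalg.svd(W, compute_uv=False).sum()

def singular_value_threshold(W, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(W, tau):
    """Proximal operator of tau * L1 norm: shrink entries toward zero."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

# Usage on a made-up weight matrix whose rows would correspond to tasks.
rng = np.random.default_rng(0)
W_tasks = rng.normal(size=(5, 8))
W_low_rank = singular_value_threshold(W_tasks, tau=1.0)
W_sparse = soft_threshold(W_tasks, tau=0.5)
```

Thresholding the singular values lowers the nuclear norm and can reduce the rank of the task-weight matrix, which is how a shared low-dimensional structure across tasks would be encouraged in this sketch.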